Specific Aims

The three challenges we faced at the outset were:

1. To consistently detect differences in nipple aspirate (NA) protein peaks, measured by
Surface-Enhanced Laser Desorption/Ionization (SELDI), between two groups of women: breast
cancer vs. non-cancer.

2.  To  identify  and  characterize  the  peak  proteins  that  are  differentially  expressed  in  cancerous 
and  non-cancerous  NA. 

3. To develop a software program that can consistently distinguish true peaks from noise and
quantify the intensity of each peak in a reproducible fashion.

Results 

1.  Using  SELDI  mass  spectrometry  we  are  able  to  generate  from  each  breast  cancer  specimen  a 
semi-quantitative  profile  of  proteins.  Currently,  we  have  analyzed  88  nipple  aspirates  derived 
from  cancerous  (47)  and  noncancerous  (41)  breasts.  Eleven  protein  peaks  are  shown  to  be 
statistically  different  between  the  two  groups:  cancer  vs.  non-cancer.  The  results  are 
summarized  below: 


Table: Molecular weights of protein peaks that differentiate cancerous from non-cancerous
nipple aspirates (NA)

Molecular Weight (dalton)    Mean std Intensity diff    T        P
10370                        -5.057                     2.24     0.0254
11118                        -3.646                     1.61     0.1070
15618                        10.76                      -4.76    <.0001
16368                        9.422                      -4.16    <.0001
20868                        -5.392                     2.38     0.0172
21618                        -7.323                     3.24     0.0012
22368                        -15.19                     6.71     <.0001
23118                        -13.31                     5.88     <.0001
23868                        -7.743                     3.42     0.0006
24618                        -4.442                     1.96     0.0503
67368                        4.085                      -1.81    0.0710

2. Identification of the two prominent protein peaks by protein separation, cleavage, and amino
acid sequencing by mass spectrometry suggested that PIP (prolactin-inducible protein) is
abundant in nipple aspirates of cancerous breasts and that clusterin is abundant in the nipple
aspirates of noncancerous breasts. We are confident that our collaborators, Drs. Kym and
Whitelegge, will lend their expertise in helping us design and refine the proteomic analysis.
3. Statistical Analysis - Software program development and comparison

The SBCC staff (Jeff Gombein & Rita Engelhardt, with input from Dr. Elliot Landaw) has been
exploring the problem of identifying peaks from SELDI.

Our preliminary analyses showed that the standardized protein profiles are reasonably well
modeled by a Gaussian, even without scale transformations such as log transformations. We can
generalize beyond the classical repeated measures assumptions, as our model (SAS procedure
MIXED) allows for variance heterogeneity and for general patterned within- and
between-subject covariance matrices.
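
For intuition only, the idea of comparing two groups without assuming equal variances can be
sketched for a single bin with Welch's t statistic. This is a much simpler stand-in, not the
mixed model described above, which is not reproduced here. The function name and the sample
values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    # Standard error uses each group's own sample variance (no pooling).
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical standardized intensities for one bin in each group.
cancer_bin = [2.0, 3.0, 4.0]
noncancer_bin = [1.0, 2.0, 3.0]
t_stat = welch_t(cancer_bin, noncancer_bin)
```

A full analysis would also need to account for the correlation between bins within a
profile, which is what the repeated measures structure of the mixed model is for.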

Since we plan to limit our protein profile comparisons to the 10,000 to 100,000 dalton range,
and since we can adjust the size of the “bins” used to partition this range, we generally do
not have a problem with missing data in any bin for any one profile. In fact, since intensity
observations are made approximately every 0.2 dalton, we typically have over 500,000
observations per profile. The large number of observations per bin also helps explain why a
Gaussian model is reasonable: by the central limit theorem, averages tend toward the Gaussian
when n is large, and we average over all the observations in each bin for each profile.
Moreover, our model uses maximum likelihood methods that allow use of all the observed data
even if data are missing for some bins in some samples, provided that the data are missing at
random.

Due to current computer memory limitations, we compress the data into bins that are 150 data
points “wide”. Since there are about 5 daltons per data point, each bin is about 750 daltons
wide. Thus, the current resolution is only about 750/2 = 375 daltons. This can easily be
improved with a more powerful computer, which would allow smaller bins. With the current bin
size, the analysis takes 3 minutes to run on a 300 MHz Pentium II computer.
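
The binning arithmetic above can be sketched in Python as follows. This is a minimal
illustration, assuming each bin is summarized by the mean of its intensity readings and that
a trailing partial bin is dropped. The function and constant names are hypothetical and are
not taken from the actual software.

```python
def bin_means(intensities, points_per_bin=150):
    """Average consecutive runs of `points_per_bin` intensity readings."""
    n_bins = len(intensities) // points_per_bin  # assumption: drop a trailing partial bin
    return [
        sum(intensities[i * points_per_bin:(i + 1) * points_per_bin]) / points_per_bin
        for i in range(n_bins)
    ]


DALTONS_PER_POINT = 5   # approximate spacing stated in the text
POINTS_PER_BIN = 150    # bin size stated in the text

bin_width = DALTONS_PER_POINT * POINTS_PER_BIN  # 750 daltons
resolution = bin_width / 2                      # 375 daltons

# Toy profile of 300 readings, compressed into two bin averages.
binned = bin_means(list(range(300)), POINTS_PER_BIN)
```

Halving `POINTS_PER_BIN` would halve the bin width and double the number of bins, which is
the trade-off against memory and run time described above.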

The results presented in this report were generated using relative peak heights. Another
approach we could use is to compare peak areas as the criterion for analysis. We plan to
calculate the relative peak area profiles of cancerous and non-cancerous NA when a more
powerful computer becomes available.
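
To illustrate the difference between the two criteria, the sketch below computes both a peak
height and a peak area over a mass window. It assumes the area is estimated with the
trapezoidal rule, which is one common choice and not necessarily the method that will be
used. The function names and the toy spectrum are hypothetical.

```python
def peak_height(masses, intensities, lo, hi):
    """Largest intensity among points whose mass lies in [lo, hi]."""
    return max(y for m, y in zip(masses, intensities) if lo <= m <= hi)


def peak_area(masses, intensities, lo, hi):
    """Trapezoidal-rule area under the intensity curve for masses in [lo, hi]."""
    pts = [(m, y) for m, y in zip(masses, intensities) if lo <= m <= hi]
    return sum((m2 - m1) * (y1 + y2) / 2 for (m1, y1), (m2, y2) in zip(pts, pts[1:]))


# Toy triangular peak sampled every 5 daltons.
masses = [0, 5, 10, 15, 20]
intensities = [0, 2, 4, 2, 0]

height = peak_height(masses, intensities, 0, 20)
area = peak_area(masses, intensities, 0, 20)
```

Two peaks of equal height but different width give the same height and different areas, so
the two criteria can rank peaks differently.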


